Search for: All records
Total Resources: 2
- Author / Contributor
  - Chen, Zhifeng (2)
  - Bruno, James E. (1)
  - Chandler, Bert D. (1)
  - Dasgupta, Anish (1)
  - Dwarica, Nicolas S. (1)
  - Gonzalez, Joseph (1)
  - Guzman, Clemente S. (1)
  - Hand, Emily R. (1)
  - Huang, Yanping (1)
  - Jin, Xin (1)
  - Li, Zhuohan (1)
  - Liu, Vincent (1)
  - Rioux, Robert M. (1)
  - Sheng, Ying (1)
  - Stoica, Ion (1)
  - Whittaker, Todd N. (1)
  - Zhang, Hao (1)
  - Zheng, Lianmin (1)
  - Zhong, Yinmin (1)
Model parallelism is conventionally viewed as a method to scale a single large deep learning model beyond the memory limits of a single device. In this paper, we demonstrate that model parallelism can be additionally used for the statistical multiplexing of multiple devices when serving multiple models, even when a single model can fit into a single device. Our work reveals a fundamental trade-off between the overhead introduced by model parallelism and the opportunity to exploit statistical multiplexing to reduce serving latency in the presence of bursty workloads. We explore the new trade-off space and present a novel serving system, AlpaServe, that determines an efficient strategy for placing and parallelizing collections of large deep learning models across a distributed cluster. Evaluation results on production workloads show that AlpaServe can process requests at up to 10× higher rates or 6× more burstiness while staying within latency constraints for more than 99% of requests.
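The trade-off the abstract describes can be illustrated with a toy queueing sketch. This is not AlpaServe's actual placement algorithm; the service times, the overhead factor, and the FIFO pipeline model are all illustrative assumptions. It contrasts dedicated placement (each model owns one device, so a burst to one model leaves other devices idle) with model-parallel multiplexing (every model sharded across all devices, so all devices absorb a burst, at the cost of parallelization overhead):

```python
def avg_latency_dedicated(trace, num_models, service=1.0):
    """Each model owns one device; requests queue FIFO at that device.
    trace: list of (arrival_time, model_id); service: seconds per request."""
    free_at = [0.0] * num_models  # when each device next becomes free
    total = 0.0
    for arrival, model in trace:
        start = max(arrival, free_at[model])
        finish = start + service
        free_at[model] = finish
        total += finish - arrival
    return total / len(trace)

def avg_latency_multiplexed(trace, num_devices, service=1.0, overhead=0.1):
    """All models sharded across all devices, forming one shared FIFO
    pipeline: per-request time drops by num_devices but pays an
    overhead factor for the parallelization."""
    per_request = service / num_devices * (1 + overhead)
    free_at = 0.0  # when the shared pipeline next becomes free
    total = 0.0
    for arrival, _model in trace:
        start = max(arrival, free_at)
        finish = start + per_request
        free_at = finish
        total += finish - arrival
    return total / len(trace)

# Bursty trace: 4 simultaneous requests, all for model 0.
burst = [(0.0, 0), (0.0, 0), (0.0, 0), (0.0, 0)]
print(avg_latency_dedicated(burst, num_models=2))     # 2.5 s: device 1 sits idle
print(avg_latency_multiplexed(burst, num_devices=2))  # ~1.38 s: both devices absorb the burst

# Balanced trace: with high enough overhead, multiplexing loses.
balanced = [(0.0, 0), (0.0, 1), (0.0, 0), (0.0, 1)]
print(avg_latency_dedicated(balanced, num_models=2))                   # 1.5
print(avg_latency_multiplexed(balanced, num_devices=2, overhead=0.5))  # 1.875
```

The two traces show both sides of the trade-off: under a burst, multiplexing wins even with overhead, while under a balanced load a large enough parallelization overhead makes dedicated placement faster.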
-
Bruno, James E.; Dwarica, Nicolas S.; Whittaker, Todd N.; Hand, Emily R.; Guzman, Clemente S.; Dasgupta, Anish; Chen, Zhifeng; Rioux, Robert M.; Chandler, Bert D. (, ACS Catalysis)
Full Text Available